Natural Language Inference for Arabic Using Extended Tree Edit Distance with Subtrees

Authors

  • Maytham Alabbas
  • Allan Ramsay

Abstract

Many natural language processing (NLP) applications require computing the similarity between pairs of syntactic or semantic trees. Researchers have often used tree edit distance for this task, but the technique has the drawback that it deals only with single-node operations. We have extended the standard tree edit distance algorithm to handle subtree transformation operations as well as single-node ones. The extended algorithm with subtree operations, TED+ST, is more effective and flexible than the standard algorithm, especially for applications that pay attention to relations among nodes (e.g. in linguistic trees, deleting a modifier subtree should be cheaper than the sum of deleting its components individually). We describe the use of TED+ST for checking entailment between two Arabic text snippets. Preliminary results with TED+ST were encouraging when compared with two string-based approaches and with the standard algorithm.
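To make the contrast between single-node and subtree operations concrete, here is a minimal Python sketch of ordered tree edit distance using the standard recursive forest-distance recurrence, with an optional pair of whole-subtree delete/insert operations. It is not the authors' TED+ST implementation: the tuple encoding of trees, the unit node costs, the hypothetical `subtree_cost` (1 for the subtree's root plus 0.5 for each further node), and the toy dependency trees are all assumptions chosen only to show why deleting a modifier subtree can cost less than deleting its nodes one by one.

```python
from functools import lru_cache
from typing import Tuple

# Assumed encoding: a tree is (label, children), children being a tuple of
# trees; a forest is a tuple of trees. Tuples keep everything hashable so
# the recursion can be memoised.
Tree = Tuple[str, tuple]
Forest = Tuple[Tree, ...]


def size(tree: Tree) -> int:
    """Number of nodes in a tree."""
    _, children = tree
    return 1 + sum(size(child) for child in children)


def subtree_cost(tree: Tree) -> float:
    """Hypothetical cost of deleting/inserting a whole subtree in one step:
    1 for its root plus 0.5 per further node, so it never exceeds the cost
    of handling the same nodes one at a time (1 each)."""
    return 1.0 + 0.5 * (size(tree) - 1)


def rename_cost(a: str, b: str) -> float:
    return 0.0 if a == b else 1.0


def tree_distance(t1: Tree, t2: Tree, use_subtrees: bool) -> float:
    @lru_cache(maxsize=None)
    def dist(f: Forest, g: Forest) -> float:
        if not f and not g:
            return 0.0
        options = []
        if f:
            v = f[-1]  # rightmost root of f
            # single-node delete: v's children take its place
            options.append(dist(f[:-1] + v[1], g) + 1.0)
            if use_subtrees:
                # delete the whole subtree rooted at v in one operation
                options.append(dist(f[:-1], g) + subtree_cost(v))
        if g:
            w = g[-1]  # rightmost root of g
            # single-node insert of w
            options.append(dist(f, g[:-1] + w[1]) + 1.0)
            if use_subtrees:
                # insert the whole subtree rooted at w in one operation
                options.append(dist(f, g[:-1]) + subtree_cost(w))
        if f and g:
            v, w = f[-1], g[-1]
            # map v to w (rename if labels differ), then align their
            # children and the remaining forests independently
            options.append(
                rename_cost(v[0], w[0]) + dist(v[1], w[1]) + dist(f[:-1], g[:-1])
            )
        return min(options)

    return dist((t1,), (t2,))


# Toy trees for "the very big dog barked" vs "the dog barked"
text = ("barked", (("dog", (("the", ()), ("big", (("very", ()),)))),))
hyp = ("barked", (("dog", (("the", ()),)),))

print(tree_distance(text, hyp, use_subtrees=False))  # 2.0: delete "very" and "big" separately
print(tree_distance(text, hyp, use_subtrees=True))   # 1.5: delete the "big" subtree at once
```

Memoising on whole forests keeps the sketch short but is only practical for small trees; it says nothing about the efficiency of the algorithm described in the paper.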

Similar resources

Optimising Tree Edit Distance with Subtrees for Textual Entailment

This paper introduces a method for improving tree edit distance (TED) for textual entailment. We explore two ways of improving TED: we extend the standard TED to use edit operations that apply to subtrees as well as to single nodes; and we use the ‘artificial bee colony’ algorithm (ABC) to estimate the cost of edit operations for single nodes and subtrees and to determine thresholds. The prelim...

A New Frequent Similar Tree Algorithm Motivated by Dom Mining - Using RTDM and its New Variant - SiSTeR

The importance of recognizing repeating structures in web applications has generated a large body of work on algorithms for mining the HTML Document Object Model (DOM). A restricted tree edit distance metric, called the Restricted Top Down Metric (RTDM), is most suitable for DOM mining as well as less computationally expensive than the general tree edit distance. Given two trees with input size...

Tree-based Hybrid Machine Translation

I present an automatic post-editing approach that combines translation systems which produce syntactic trees as output. The nodes in the generation tree and target-side SCFG tree are aligned and form the basis for computing structural similarity. Structural similarity computation aligns subtrees and based on this alignment, subtrees are substituted to create more accurate translations. Two diffe...

TASM: Top-k Approximate Subtree Matching

We consider the Top-k Approximate Subtree Matching (TASM) problem: finding the k best matches of a small query tree, e.g., a DBLP article with 15 nodes, in a large document tree, e.g., DBLP with 26M nodes, using the canonical tree edit distance as a similarity measure between subtrees. Evaluating the tree edit distance for large XML trees is difficult: the best known algorithms have cubic runti...

Analyzing Edit Distance on Trees: Tree Swap Distance is Intractable

The string correction problem looks at minimal ways to modify one string into another using fixed operations, such as inserting a symbol, deleting a symbol and interchanging the positions of two symbols (a “swap”). This has been generalized to trees in various ways, but unfortunately having operations to insert/delete nodes in the tree and operations that move subtrees, such as a “s...

Journal:
  • J. Artif. Intell. Res.

Volume: 48

Publication date: 2013